Enhancing Supervised Learning with Unlabeled Data (ICML 2000)
Authors
Abstract
In many practical learning scenarios, there is a small amount of labeled data along with a large pool of unlabeled data. Many supervised learning algorithms have been developed and extensively studied. We present a new "co-training" strategy for using unlabeled data to improve the performance of standard supervised learning algorithms. Unlike much of the prior work, such as the co-training procedure of Blum and Mitchell (1998), we do not assume there are two redundant views both of which are sufficient for perfect classification. The only requirement our co-training strategy places on each supervised learning algorithm is that its hypothesis partitions the example space into a set of equivalence classes (e.g. for a decision tree each leaf defines an equivalence class). We evaluate our co-training strategy via experiments using data from the UCI repository.
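The equivalence-class requirement can be made concrete with a small sketch. The Python code below is a minimal illustration under assumptions of our own (scikit-learn decision trees as the two learners, a fixed confidence threshold, dense NumPy arrays), not the paper's exact procedure: each tree's leaves form the equivalence classes, and in each round every learner labels its most confident unlabeled examples for the other learner.

```python
# Minimal co-training sketch (illustrative, not the paper's exact algorithm):
# two standard supervised learners whose hypotheses partition the example
# space into equivalence classes (here, decision-tree leaves) label confident
# unlabeled examples for each other.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def co_train(X_lab, y_lab, X_unlab, rounds=5, threshold=0.9):
    learners = [DecisionTreeClassifier(max_depth=3, random_state=0),
                DecisionTreeClassifier(random_state=1)]
    # Each learner keeps its own training set, seeded with the labeled data.
    train = [(X_lab.copy(), y_lab.copy()) for _ in learners]
    pool = X_unlab.copy()

    for _ in range(rounds):
        for i, clf in enumerate(learners):
            clf.fit(*train[i])
        for i, clf in enumerate(learners):
            if len(pool) == 0:
                break
            # Confidence = label frequency inside the leaf (equivalence class)
            # that each unlabeled example falls into.
            proba = clf.predict_proba(pool)
            conf = proba.max(axis=1)
            picked = conf >= threshold
            if not picked.any():
                continue
            labels = clf.classes_[proba[picked].argmax(axis=1)]
            j = 1 - i  # hand the newly labeled examples to the other learner
            Xj, yj = train[j]
            train[j] = (np.vstack([Xj, pool[picked]]),
                        np.concatenate([yj, labels]))
            pool = pool[~picked]
    return learners
```

On a UCI-style dataset one would call co_train(X_lab, y_lab, X_unlab) and then combine the two trees' predictions, for example by averaging their predict_proba outputs.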
Similar resources
Enhancing Supervised Learning with Unlabeled Data
In many practical learning scenarios, there is a small amount of labeled data along with a large pool of unlabeled data. Many supervised learning algorithms have been developed and extensively studied. We present a new "co-training" strategy for using unlabeled data to improve the performance of standard supervised learning algorithms. Unlike much of the prior work, such as the co-training pro...
Full text
Semi-Supervised Learning of Mixture Models
This paper analyzes the performance of semi-supervised learning of mixture models. We show that unlabeled data can lead to an increase in classification error even in situations where additional labeled data would decrease classification error. We present a mathematical analysis of this “degradation” phenomenon and show that it is due to the fact that bias may be adversely affected by unlabeled ...
Full text
Large Scale Text Classification using Semi-supervised Multinomial Naive Bayes
Numerous semi-supervised learning methods have been proposed to augment Multinomial Naive Bayes (MNB) using unlabeled documents, but their use in practice is often limited due to implementation difficulty, inconsistent prediction performance, or high computational cost. In this paper, we propose a new, very simple semi-supervised extension of MNB, called Semi-supervised Frequency Estimate (SFE)...
Full text
Semi-Supervised Classification Based on Classification from Positive and Unlabeled Data
Most of the semi-supervised classification methods developed so far use unlabeled data for regularization purposes under particular distributional assumptions such as the cluster assumption. In contrast, recently developed methods of classification from positive and unlabeled data (PU classification) use unlabeled data for risk evaluation, i.e., label information is directly extracted from unla... (a minimal sketch of this risk-evaluation idea appears after this list).
Full text
Combining Labeled and Unlabeled Data for MultiClass Text Categorization
Supervised learning techniques for text classification often require a large number of labeled examples to learn accurately. One way to reduce the amount of labeled data required is to develop algorithms that can learn effectively from a small number of labeled examples augmented with a large number of unlabeled examples. Current text learning techniques for combining labeled and unlabeled, such ...
Full text
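To make the risk-evaluation idea from the PU-classification entry above concrete, here is a hedged sketch of a standard unbiased PU risk estimate computed from positive and unlabeled classifier scores; the function name, the logistic loss, and the assumption of a known class prior are illustrative choices, not that paper's exact formulation.

```python
# Illustrative sketch of an unbiased PU (positive-unlabeled) risk estimate:
# the unlabeled set stands in for negatives, and a prior-weighted correction
# term removes the positives hidden inside it. Names and the logistic loss
# are assumptions for illustration, not the cited paper's code.
import numpy as np

def pu_risk(scores_p, scores_u, prior, loss=lambda m: np.log1p(np.exp(-m))):
    """scores_p: classifier outputs g(x) on known positive examples
       scores_u: classifier outputs g(x) on unlabeled examples
       prior:    class prior pi = P(y = +1), assumed known or estimated"""
    r_p_pos = loss(scores_p).mean()    # risk of positives labeled +1
    r_u_neg = loss(-scores_u).mean()   # risk of unlabeled treated as -1
    r_p_neg = loss(-scores_p).mean()   # correction: positives treated as -1
    return prior * r_p_pos + r_u_neg - prior * r_p_neg
```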